Dynamic Rank/Select Dictionaries with Applications to XML Indexing
Authors
Abstract
We consider a central problem in text indexing: given a text T over an alphabet Σ, construct a compressed data structure answering the queries char(i), rank_s(i), and select_s(i) for a symbol s ∈ Σ. Many data structures consider these queries for static text T [GGV03, FM01, SG06, GMR06]. We consider the dynamic version of the problem, where we are allowed to insert and delete symbols at arbitrary positions of T. This problem is a key challenge in compressed text indexing and has direct application to dynamic XML indexing structures that answer subpath queries [FLMM05]. We build on the results of [RRR02, GMR06] and give the best known query bounds for the dynamic version of this problem, supporting arbitrary insertions and deletions of symbols in T. Specifically, with an amortized update time of O((1/ε) n^ε), we suggest how to support rank_s(i), select_s(i), and char(i) queries in O((1/ε) log log n) time, for any ε < 1. The best previous query time for this problem was O(log n log |Σ|), given by [MN06]. Our bounds are competitive with state-of-the-art static structures [GMR06]. Applicable lower bounds for the partial sums problem [PD06] show that our update/query tradeoff is also nearly optimal. In addition, our space bound is competitive with the corresponding static structures. For the special case of bitvectors (i.e., |Σ| = 2), we also show the best tradeoffs for query/update time, improving upon the results of [MN06, HSS03, RRR02]. Finally, our focus on fast queries at the cost of slower updates is well suited to a query-intensive XML indexing environment. Using the XBW transform [FLMM05], we also present a dynamic data structure that succinctly maintains an ordered labeled tree T and supports a powerful set of queries on T.

∗Department of Computer Sciences, Purdue University, West Lafayette, IN 47907–2066, USA ({agupta, wkhon, rahul}@cs.purdue.edu, [email protected]).
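To make the three queries and the two update operations in the abstract concrete, here is a minimal sketch of a naive baseline in Python. It is not the compressed structure the paper describes: every operation here takes time linear in the text length, and the class name `NaiveDynamicSequence`, the method names, and the 1-based indexing convention are all assumptions made for illustration.

```python
class NaiveDynamicSequence:
    """Uncompressed baseline showing query semantics only (hypothetical name).

    Positions are 1-based; this convention is an assumption, not taken
    from the paper. Every operation costs O(n) time.
    """

    def __init__(self, text=""):
        self._symbols = list(text)

    def char(self, i):
        """Return the symbol at position i."""
        return self._symbols[i - 1]

    def rank(self, s, i):
        """Count occurrences of symbol s among the first i positions."""
        return self._symbols[:i].count(s)

    def select(self, s, j):
        """Return the position of the j-th occurrence of s, or None."""
        seen = 0
        for pos, sym in enumerate(self._symbols, start=1):
            if sym == s:
                seen += 1
                if seen == j:
                    return pos
        return None

    def insert(self, i, s):
        """Insert s so that it becomes the symbol at position i."""
        self._symbols.insert(i - 1, s)

    def delete(self, i):
        """Delete the symbol at position i."""
        del self._symbols[i - 1]


seq = NaiveDynamicSequence("abracadabra")
print(seq.char(3), seq.rank("a", 5), seq.select("a", 3))
seq.insert(1, "a")  # the text is now "aabracadabra"
print(seq.rank("a", 5), seq.select("a", 3))
```

The point of the dynamic structures discussed in the abstract is to answer the same queries far faster than this baseline while keeping the text compressed and still allowing `insert` and `delete` at arbitrary positions.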
Similar resources
A Framework for Dynamizing Succinct Data Structures
We present a framework to dynamize succinct data structures, to encourage their use over non-succinct versions in a wide variety of important application areas. Our framework can dynamize most state-of-the-art succinct data structures for dictionaries, ordinal trees, labeled trees, and text collections. Of particular note is its direct application to XML indexing structures that answer subpath q...
Efficient Dynamic Indexing and Retrieval of XML Documents using Three-Dimensional Quasi-BitCube
XML is a new standard for exchanging and representing data on the Internet. Techniques for indexing and retrieving XML data are drawing increasing attention, since they enable one to access certain parts of retrieved documents easily. However, they provide little or no support for adding new documents to an existing document collection, requiring instead that the entire collection be re-indexed...
Upper and Lower Bounds for Text Indexing Data Structures
The main goal of this thesis is to investigate the complexity of a variety of problems related to text indexing and text searching. We present new data structures that can be used as building blocks for full-text indices that occupy very little space (FM-indexes) and for wavelet trees. These data structures can also be used to represent labeled trees and posting lists. Labeled trees are applied in XM...
Indexing and Querying Semistructured Data Views of Relational Database
The most promising and dominant format for processing and representing data on the Internet is the semistructured data form termed XML. XML data has no fixed schema; it evolves and is self-describing, which makes it harder to manage than, for example, relational data. XML queries differ from relational queries in that the former are expressed as path expressions. The efficien...
Assessing the Compliance of the Structural Requirements of Iranian Medical Journals with the Criteria Expected by PubMed Central
Introduction: In recent years, the number of Iranian medical journals has been growing. To be included in international indexing databases, these journals should comply with the criteria those databases require. The aim of this study was therefore to determine the compliance of Iranian medical journals with the structural criteria of PubMed Central journal sel...